Computing Intensions of Digital Library Collections

Authors

  • Carlo Meghini
  • Nicolas Spyratos
Abstract

We model a Digital Library as a formal context in which objects are documents and attributes are terms describing document contents. A formal concept is very close to the notion of a collection: the concept extent is the extension of the collection, and the concept intent is a set of terms, the collection intension. The collection intension can be viewed as a simple conjunctive query that evaluates precisely to the extension. For certain collections, however, no concept may exist, in which case the concept that best approximates the extension must be used. The resulting concept may be too imprecise if too many documents denoted by its intension lie outside the extension. We then look for a more precise intension by exploring three different query languages: conjunctive queries with negation; disjunctions of negation-free conjunctive queries; and disjunctions of conjunctive queries with negation. We show that a precise description can always be found in one of these languages for any set of documents. However, when disjunction is introduced, uniqueness of the solution is lost. To deal with this problem, we define a preferential criterion on queries, based on the conciseness of their expression. We then show that minimal queries are hard to find in the last two of the three languages above.
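
The relationship between a collection's extension and its conjunctive intension can be illustrated with a small sketch. It is not taken from the paper: the toy context `CONTEXT`, the document ids, the terms, and the function names `intent`, `extent`, and `approximate` are all hypothetical, chosen only to show the standard derivation operators of formal concept analysis on which the abstract relies.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy context: document id -> set of terms describing it.
CONTEXT: Dict[str, Set[str]] = {
    "d1": {"a", "b"},
    "d2": {"a", "b", "c"},
    "d3": {"a", "c"},
    "d4": {"b"},
}


def intent(docs: Set[str], context: Dict[str, Set[str]]) -> Set[str]:
    """Terms shared by every document in docs (all terms if docs is empty)."""
    shared = set().union(*context.values())
    for doc in docs:
        shared &= context[doc]
    return shared


def extent(terms: Set[str], context: Dict[str, Set[str]]) -> Set[str]:
    """Documents whose description contains every term in terms."""
    return {doc for doc, described_by in context.items() if terms <= described_by}


def approximate(
    docs: Set[str], context: Dict[str, Set[str]]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return the conjunctive intension of docs, the documents it denotes,
    and the denoted documents that lie outside docs (the imprecision)."""
    terms = intent(docs, context)
    denoted = extent(terms, context)
    return terms, denoted, denoted - docs


# {d1, d2} is the extent of a concept: the conjunctive query "a AND b" is precise.
print(approximate({"d1", "d2"}, CONTEXT))
# {d1, d3} is not: the best conjunctive query "a" also denotes d2.
print(approximate({"d1", "d3"}, CONTEXT))
```

In this sketch, a non-empty third component signals that the negation-free conjunctive intension is imprecise, which is the situation that motivates the richer query languages discussed in the abstract.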

Similar resources

Coping with very large digital collections using Greenstone

The Greenstone digital library software is widely used for small to medium digital library collections, but its reputation for creating very large collections is less well established. This paper describes how Greenstone is being used to produce large newspaper collections for the National Libraries of New Zealand and Singapore. It also describes current developments that integrat...

A Distributed Digital Library Architecture Incorporating Different Index Styles

The New Zealand Digital Library offers several collections of information over the World Wide Web. Although full-text indexing is the primary access mechanism, musical collections can also be accessed through a novel melody retrieval system. In offering this service over a three-year period, we have had to face many practical challenges in building, maintaining, and administering diverse collec...

Greenstone: open-source digital library software with end-user collection building

The Greenstone digital library software is an open-source system for the construction and presentation of information collections. Collections built with Greenstone offer effective full-text searching and metadata-based browsing facilities that are attractive and easy to use. Moreover, they are easily maintainable and can be augmented and rebuilt entirely automatically. The system is extensible...

Distributed Digital Libraries Platform in the PIONIER Network

The dLibra Digital Library Framework (http://dlibra.psnc.pl/) is a Polish digital library software platform developed by Poznan Supercomputing and Networking Center as a part of the PIONIER programme (http://www.pionier.gov.pl/). The dLibra project was started in 1999, as a part of research in the field of digital libraries started in PSNC in 1996. The developed platform is currently the most p...

An Experiment on Personal Archiving and Retrieving Image System (PARIS)

PARIS (Personal Archiving and Retrieving Image System) is an experimental personal photograph library comprising more than 80,000 consumer photographs accumulated over approximately five years, metadata based on our proposed MPEG-7 annotation architecture, Dozen Dimensional Digital Content (DDDC), and a relational database structure. The DDDC architecture is specially desi...


Publication date: 2007